Gaussian approximation for the sup-norm of high-dimensional matrix-variate U-statistics and its applications
This paper studies the Gaussian approximation of high-dimensional and
non-degenerate U-statistics of order two under the supremum norm. We propose a
two-step Gaussian approximation procedure that does not impose structural
assumptions on the data distribution. Specifically, subject to mild moment
conditions on the kernel, we establish the explicit rate of convergence that
decays polynomially in sample size for a high-dimensional scaling limit, where
the dimension can be much larger than the sample size. We also provide a
practical Gaussian wild bootstrap method to approximate the quantiles of the
maxima of centered U-statistics and prove its asymptotic validity. The wild
bootstrap is demonstrated on statistical applications for high-dimensional
non-Gaussian data including: (i) principled and data-dependent tuning parameter
selection for regularized estimation of the covariance matrix and its related
functionals; (ii) simultaneous inference for the covariance and rank
correlation matrices. In particular, for the thresholded covariance matrix
estimator with the bootstrap selected tuning parameter, we show that the
Gaussian-like convergence rates can be achieved for heavy-tailed data, which
are less conservative than those obtained by the Bonferroni technique that
ignores the dependency in the underlying data distribution. In addition, we
also show that even for subgaussian distributions, error bounds of the
bootstrapped thresholded covariance matrix estimator can be much tighter than
those of the minimax estimator with a universal threshold.
A Note on Moment Inequality for Quadratic Forms
Moment inequalities for quadratic forms of random vectors are of particular
interest in covariance matrix testing and estimation problems. In this paper,
we prove a Rosenthal-type inequality that exhibits new features and improves
on the unstructured Rosenthal inequality for quadratic forms
when the dimension of the vectors increases without bound. Applications to
testing block-diagonal structure and detecting sparsity in a high-dimensional
covariance matrix are presented. Comment: 12 pages, 0 figures
A robust bootstrap change point test for high-dimensional location parameter
We consider the problem of change point detection for high-dimensional
distributions in a location family when the dimension can be much larger than
the sample size. In change point analysis, the widely used cumulative sum
(CUSUM) statistics are sensitive to outliers and heavy-tailed distributions. In
this paper, we propose a robust, tuning-free (i.e., fully data-dependent), and
easy-to-implement change point test that enjoys strong theoretical guarantees.
To achieve robustness in a nonparametric setting, we formulate
change point detection in the multivariate U-statistics framework with
anti-symmetric and nonlinear kernels. Specifically, the within-sample noise is
canceled out by anti-symmetry of the kernel, while the signal distortion under
certain nonlinear kernels can be controlled such that the between-sample change
point signal is magnitude preserving. A (half) jackknife multiplier bootstrap
(JMB) tailored to the change point detection setting is proposed to calibrate
the distribution of our $\ell^{\infty}$-norm aggregated test statistic. Subject
to mild moment conditions on kernels, we derive the uniform rates of
convergence for the JMB to approximate the sampling distribution of the test
statistic, and analyze its size and power properties. Extensions to multiple
change point testing and estimation are discussed with illustration from
numerical studies.
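
The anti-symmetric kernel idea in the abstract above can be illustrated with a small sketch. It assumes the coordinate-wise sign kernel h(x, y) = sign(y - x), which is one anti-symmetric nonlinear kernel and not necessarily the one used in the paper; the function names, the scaling, and the boundary parameter are hypothetical, and the bootstrap calibration (JMB) is not reproduced here.

```python
import math
import random

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def sign_kernel_statistic(x, s):
    """Sup-norm over coordinates of a scaled two-sample sign statistic at split s.

    x is a list of n observations, each a list of p floats. Only pairs (i, k)
    with i before the split and k after it are compared.
    """
    n, p = len(x), len(x[0])
    scale = math.sqrt(s * (n - s) / n)
    worst = 0.0
    for j in range(p):
        total = sum(sign(x[k][j] - x[i][j]) for i in range(s) for k in range(s, n))
        worst = max(worst, abs(scale * total / (s * (n - s))))
    return worst

def scan(x, boundary):
    """Evaluate the statistic at every admissible split; return (max, argmax)."""
    n = len(x)
    stats = {s: sign_kernel_statistic(x, s) for s in range(boundary, n - boundary + 1)}
    s_hat = max(stats, key=stats.get)
    return stats[s_hat], s_hat

def cauchy(rng):
    """Draw a standard Cauchy variate (heavy tails, no finite mean)."""
    return math.tan(math.pi * (rng.random() - 0.5))

# Illustrative data: Cauchy noise, location shift in coordinate 0 at index 50.
rng = random.Random(1)
n, p, change_at = 100, 2, 50
noise = [[cauchy(rng) for _ in range(p)] for _ in range(n)]
shifted = [
    [v + (10.0 if i >= change_at and j == 0 else 0.0) for j, v in enumerate(row)]
    for i, row in enumerate(noise)
]
null_stat, _ = scan(noise, boundary=10)
shift_stat, s_hat = scan(shifted, boundary=10)
print(null_stat, shift_stat, s_hat)
```

Because the sign kernel is bounded, a single extreme Cauchy draw cannot dominate the statistic, which is the kind of robustness the abstract contrasts with CUSUM.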
Inference in Kingman's Coalescent with Particle Markov Chain Monte Carlo Method
We propose a new algorithm to do posterior sampling of Kingman's coalescent,
based upon the Particle Markov Chain Monte Carlo methodology. Specifically, the
algorithm is an instantiation of the Particle Gibbs Sampling method, which
alternately samples coalescent times conditioned on coalescent tree structures,
and tree structures conditioned on coalescent times via the conditional
Sequential Monte Carlo procedure. We implement our algorithm as a C++ package,
and demonstrate its utility via a parameter estimation task in population
genetics on both single- and multiple-locus data. The experimental results show
that the proposed algorithm performs comparably to or better than several
well-developed methods.
Randomized incomplete U-statistics in high dimensions
This paper studies inference for the mean vector of a high-dimensional
U-statistic. In the era of Big Data, the dimension of the U-statistic
and the sample size of the observations tend to be both large, and the
computation of the U-statistic is prohibitively demanding. Data-dependent
inferential procedures such as the empirical bootstrap for U-statistics are
even more computationally expensive. To overcome this computational bottleneck,
incomplete U-statistics obtained by sampling fewer terms of the U-statistic
are attractive alternatives. In this paper, we introduce randomized incomplete
U-statistics with sparse weights whose computational cost can be made
independent of the order of the U-statistic. We derive non-asymptotic
Gaussian approximation error bounds for the randomized incomplete
U-statistics in high dimensions, namely in cases where the dimension is
possibly much larger than the sample size, for both non-degenerate and
degenerate kernels. In addition, we propose generic bootstrap methods for the
incomplete U-statistics that are computationally much less demanding than
existing bootstrap methods, and establish finite sample validity of the
proposed bootstrap methods. Our methods are illustrated on the application to
nonparametric testing for the pairwise independence of a high-dimensional
random vector under weaker assumptions than those appearing in the literature.
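
A minimal sketch of the idea of sampling fewer kernel terms, assuming an order-two kernel and Bernoulli sampling of pairs. The function names, the `budget` parameter, and the variance kernel are illustrative assumptions, not taken from the paper. For clarity the loop still visits every pair; a practical version would draw the sampled pairs directly so that the cost scales with the budget.

```python
import itertools
import random

def complete_u_statistic(data, kernel):
    """Average a symmetric order-two kernel over all unordered pairs."""
    pairs = list(itertools.combinations(data, 2))
    return sum(kernel(x, y) for x, y in pairs) / len(pairs)

def incomplete_u_statistic(data, kernel, budget, rng):
    """Bernoulli-sampled incomplete U-statistic.

    Each pair is kept independently with probability budget / (number of
    pairs), and the kernel is averaged over the kept pairs only.
    """
    n = len(data)
    n_pairs = n * (n - 1) // 2
    keep_prob = min(1.0, budget / n_pairs)
    total, kept = 0.0, 0
    for x, y in itertools.combinations(data, 2):
        if rng.random() < keep_prob:
            total += kernel(x, y)
            kept += 1
    if kept == 0:
        raise ValueError("no pairs were sampled; increase the budget")
    return total / kept

def variance_kernel(x, y):
    """Kernel whose complete U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2

rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]
full = complete_u_statistic(sample, variance_kernel)      # uses all 19,900 pairs
approx = incomplete_u_statistic(sample, variance_kernel, budget=2000, rng=rng)
print(full, approx)
```

With a budget of 2,000 the incomplete version evaluates the kernel on roughly a tenth of the pairs, trading a small amount of extra variance for that saving.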
Jackknife multiplier bootstrap: finite sample approximations to the U-process supremum with applications
This paper is concerned with finite sample approximations to the supremum of
a non-degenerate U-process of a general order indexed by a function class. We
are primarily interested in situations where the function class as well as the
underlying distribution change with the sample size, and the U-process itself
is not weakly convergent as a process. Such situations arise in a variety of
modern statistical problems. We first consider Gaussian approximations, namely,
approximating the U-process supremum by the supremum of a Gaussian process, and
derive coupling and Kolmogorov distance bounds. Such Gaussian approximations
are, however, not often directly applicable in statistical problems since the
covariance function of the approximating Gaussian process is unknown. This
motivates us to study bootstrap-type approximations to the U-process
supremum. We propose a novel jackknife multiplier bootstrap (JMB) tailored to
the U-process, and derive coupling and Kolmogorov distance bounds for the
proposed JMB method. All these results are non-asymptotic, and established
under fairly general conditions on function classes and underlying
distributions. Key technical tools in the proofs are new local maximal
inequalities for U-processes, which may be useful in other problems. We also
discuss applications of the general approximation results to testing for
qualitative features of nonparametric functions based on generalized local
U-processes.
Finite sample change point inference and identification for high-dimensional mean vectors
Cumulative sum (CUSUM) statistics are widely used in change point
inference and identification. For the problem of testing for existence of a
change point in an independent sample generated from the mean-shift model, we
introduce a Gaussian multiplier bootstrap to calibrate critical values of the
CUSUM test statistics in high dimensions. The proposed bootstrap CUSUM test is
fully data-dependent and it has strong theoretical guarantees under arbitrary
dependence structures and mild moment conditions. Specifically, we show that
with a boundary removal parameter the bootstrap CUSUM test enjoys uniform
validity in size under the null and achieves the minimax separation rate
under sparse alternatives when the dimension can be larger than the
sample size.
Once a change point is detected, we estimate the change point location by
maximizing the $\ell^{\infty}$-norm of the generalized CUSUM statistics at two
different weighting scales corresponding to covariance stationary and
non-stationary CUSUM statistics. For both estimators, we derive their rates of
convergence and show that dimension impacts the rates only through logarithmic
factors, which implies that consistency of the CUSUM estimators is possible
when the dimension is much larger than the sample size. In the presence of multiple change points, we
propose a principled bootstrap-assisted binary segmentation (BABS) algorithm to
dynamically adjust the change point detection rule and recursively estimate
their locations. We derive its rate of convergence under suitable signal
separation and strength conditions.
The results derived in this paper are non-asymptotic and we provide extensive
simulation studies to assess the finite sample performance. The empirical
evidence shows an encouraging agreement with our theoretical results.
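
The testing step described above can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: the CUSUM scaling sqrt(s(n-s)/n), the segment-wise centering inside the bootstrap, and all function and parameter names (`boundary`, `n_boot`, `alpha`) are choices made for the example.

```python
import math
import random

def prefix_sums(rows):
    """Running column sums: out[s][j] is the sum of rows[i][j] over i < s."""
    p = len(rows[0])
    out = [[0.0] * p]
    for row in rows:
        out.append([out[-1][j] + row[j] for j in range(p)])
    return out

def cusum_sup_norm(x, boundary):
    """Maximum over admissible splits of the sup-norm of the CUSUM vector."""
    n, p = len(x), len(x[0])
    cum = prefix_sums(x)
    best, best_s = -1.0, None
    for s in range(boundary, n - boundary + 1):
        scale = math.sqrt(s * (n - s) / n)
        stat = max(
            abs(scale * (cum[s][j] / s - (cum[n][j] - cum[s][j]) / (n - s)))
            for j in range(p)
        )
        if stat > best:
            best, best_s = stat, s
    return best, best_s

def bootstrap_critical_value(x, boundary, alpha, n_boot, rng):
    """(1 - alpha) quantile of a Gaussian multiplier version of the statistic.

    Each observation is centered by the mean of its own segment and
    multiplied by an independent N(0, 1) weight.
    """
    n, p = len(x), len(x[0])
    cum = prefix_sums(x)
    maxima = []
    for _ in range(n_boot):
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        cum_e = prefix_sums([[w] for w in e])
        cum_ex = prefix_sums([[w * v for v in row] for w, row in zip(e, x)])
        worst = 0.0
        for s in range(boundary, n - boundary + 1):
            a = math.sqrt((n - s) / (n * s))
            b = math.sqrt(s / (n * (n - s)))
            e_left = cum_e[s][0]
            e_right = cum_e[n][0] - e_left
            for j in range(p):
                mean_left = cum[s][j] / s
                mean_right = (cum[n][j] - cum[s][j]) / (n - s)
                left = cum_ex[s][j] - mean_left * e_left
                right = (cum_ex[n][j] - cum_ex[s][j]) - mean_right * e_right
                worst = max(worst, abs(a * left - b * right))
        maxima.append(worst)
    maxima.sort()
    index = min(n_boot - 1, max(0, math.ceil((1 - alpha) * n_boot) - 1))
    return maxima[index]

# Illustrative data: mean shift of 3 in coordinate 0 after the first 40 points.
rng = random.Random(0)
n, p, change_at = 80, 4, 40
x = [
    [rng.gauss(0.0, 1.0) + (3.0 if i >= change_at and j == 0 else 0.0) for j in range(p)]
    for i in range(n)
]
stat, s_hat = cusum_sup_norm(x, boundary=10)
crit = bootstrap_critical_value(x, boundary=10, alpha=0.05, n_boot=200, rng=rng)
print(stat, crit, s_hat)
```

The test rejects when `stat` exceeds `crit`, and `s_hat` is the split that maximizes the statistic. Skipping splits within `boundary` of either end plays the role of the boundary removal parameter mentioned in the abstract.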
Inference of high-dimensional linear models with time-varying coefficients
We propose a pointwise inference algorithm for high-dimensional linear models
with time-varying coefficients. The method is based on a novel combination of
the nonparametric kernel smoothing technique and a Lasso bias-corrected ridge
regression estimator. Because the model is non-stationary, we obtain a dynamic
bias-variance decomposition of the estimator. With a
bias-correction procedure, the local null distribution of the estimator of the
time-varying coefficient vector is characterized for iid Gaussian and
heavy-tailed errors. The limiting null distribution is also established for
Gaussian process errors, and we show that the asymptotic properties differ
between short-range and long-range dependent errors. Here, p-values are
adjusted by a Bonferroni-type correction procedure to control the familywise
error rate (FWER) in the asymptotic sense at each time point. The finite-sample
performance of the proposed inference algorithm is illustrated with
synthetic data and an application to learning brain connectivity from
resting-state fMRI data for Parkinson's disease.
Hanson-Wright inequality in Hilbert spaces with application to k-means clustering for non-Euclidean data
We derive a dimension-free Hanson-Wright inequality for quadratic forms of
independent sub-gaussian random variables in a separable Hilbert space. Our
inequality is an infinite-dimensional generalization of the classical
Hanson-Wright inequality for finite-dimensional Euclidean random vectors. We
illustrate an application to the generalized k-means clustering problem for
non-Euclidean data. Specifically, we establish the exponential rate of
convergence for a semidefinite relaxation of the generalized k-means, which
together with a simple rounding algorithm implies the exact recovery of the true
clustering structure.
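
For reference, the classical finite-dimensional inequality that the abstract generalizes is commonly stated in the following form. The notation is an assumption made here and is not taken from the abstract: X has independent, mean-zero, sub-gaussian coordinates with sub-gaussian norm at most K, A is a fixed n x n matrix, and c is an absolute constant.

```latex
% Classical Hanson-Wright inequality (finite-dimensional Euclidean case).
% X = (X_1, \dots, X_n): independent, mean-zero, \|X_i\|_{\psi_2} \le K.
% A: fixed n \times n matrix; c > 0: absolute constant.
\[
  \mathbb{P}\left( \left| X^{\top} A X - \mathbb{E}\, X^{\top} A X \right| > t \right)
  \le 2 \exp\left( -c \min\left\{
      \frac{t^{2}}{K^{4} \|A\|_{\mathrm{HS}}^{2}},\;
      \frac{t}{K^{2} \|A\|_{\mathrm{op}}}
  \right\} \right),
  \qquad t \ge 0 .
\]
```

The bound depends on A only through its Hilbert-Schmidt and operator norms, not through n.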
Distributed Consensus Resilient to Both Crash Failures and Strategic Manipulations
In this paper, we study distributed consensus in synchronous systems subject
to both unexpected crash failures and strategic manipulations by rational
agents in the system. We adapt the concept of collusion-resistant Nash
equilibrium to model protocols that are resilient to both crash failures and
strategic manipulations of a group of colluding agents. For a system with n
distributed agents, we design a deterministic protocol that tolerates 2
colluding agents and a randomized protocol that tolerates n - 1 colluding
agents, and both tolerate any number of failures. We also show that if
colluders are allowed an extra communication round after each synchronous
round, there is no protocol that can tolerate even 2 colluding agents and 1
crash failure.